Bayesian constrained frequency warping HMMS for speaker normalisation
نویسندگان
چکیده
This paper presents a Bayesian constrained frequency warping technique. The Bayesian approach provides for inclusion of the prior information of the frequency warping parameter and for adjusting the search range in order to obtain the best warping factor dependent on HMMs. We introduce novel frequency warping (FWP) HMMs which are different warped versions of HMMs. Instead of frequency warping of the input speech we warp the spectrum of the HMMs. This is equivalent to HMMs which have both time and frequency warping capabilities. Experimentally FWP HMMs outperform the conventional constrained frequency warping approach. Furthermore, the best warping factor is estimated in two stages, a coarse stage followed by a fine stage. This method efficiently gauges the optimal warping factor and normalises the FWP HMMs.
منابع مشابه
Frequency Warping for Speaker Adaptation in HMM-based Speech Synthesis
Speaker adaptation in speech synthesis transforms a source utterance to a target utterance that differs from the source in terms of voice characteristics. In this paper, we employ vocal tract length normalization, which is generally used in speech recognition to remove individual speaker characteristics, to speaker adaptation in speech synthesis. We propose a frequency warping approach based on...
متن کاملWavelet-Based Mel-Frequency Cepstral Coefficients for Speaker Identification using Hidden Markov Models
To improve the performance of speaker identification systems, an effective and robust method is proposed to extract speech features, capable of operating in noisy environment. Based on the time-frequency multi-resolution property of wavelet transform, the input speech signal is decomposed into various frequency channels. For capturing the characteristic of the signal, the Mel-Frequency Cepstral...
متن کاملApplying vocal tract length normalization to meeting recordings
Vocal Tract Length Normalisation (VTLN) is a commonly used technique to normalise for inter-speaker variability. It is based on the speaker-specifi c warping of the frequency axis, parameterised by a scalar warp factor. This factor is typically estimated using maximum likelihood. We discuss how VTLN may be applied to multiparty conversations, reporting a substantial decrease in word error rate ...
متن کاملOn combining vocal tract length normalisation and speaker adaptation for noise robust speech recognition
This paper investigates the combination of vocal tract length normalisation and speaker adaptation in connected digit recognition. In particular, we focus on performing this task under a continuously varying car noise environment. Continuous supervised speaker and environment adaptation is carried out on the test data according to the Bayesian framework. The paper also evaluates various approac...
متن کاملOn Combining Vocal Tract Length Normalisation and Speaker Adapation for Noise Robust Speech Recognition
This paper investigates the combination of vocal tract length normalisation and speaker adaptation in connected digit recognition. In particular, we focus on performing this task under a continuously varying car noise environment. Continuous supervised speaker and environment adaptation is carried out on the test data according to the Bayesian framework. The paper also evaluates various approac...
متن کامل